Person tracking method and apparatus using robot

ABSTRACT

A person tracking method and apparatus using a robot. The person tracking method includes: detecting a person in a first window of a current input image using a skin color of the person; and setting a plurality of second windows in a next input image, correlating the first window and the second windows and tracking the detected person in the next input image using the correlated results.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of Korean Patent Application No. 2004-0066396, filed on Aug. 23, 2004, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a person tracking method and apparatus using a robot, and more particularly to a method and an apparatus for detecting a person from an input image and tracking the motion of the detected person using a robot.

2. Description of Related Art

Recently, robots have been spotlighted as systems to replace humans for simple tasks in a home or in places that are hard to access in person. Currently, the function of a robot is limited to performing simple repeated tasks. A prerequisite for performing more intelligent tasks is interaction with the person who employs the robot. For smooth interaction, the robot needs to be able to locate and track the user so that it stays in the vicinity of the user.

One way a robot can locate and track a user is by face detection. Most existing face detecting devices locate a person indoors or outdoors either by storing a background image and then detecting motion of the person using a difference image obtained by subtracting the background image from the input image, or by tracking the location of the person using only shape information. The method using the difference image between the input image and the background image is very efficient with a fixed camera, but not with the continuously moving camera arranged in a robot, because the background image continuously changes. On the other hand, the method using the shape information of the person takes a long time to locate the person, because a plurality of model images similar to a person's shape must be matched against the whole input image.

BRIEF SUMMARY

An aspect of the present invention provides a method and apparatus for detecting a person using a skin color from an input image and tracking the detected person.

According to an aspect of the present invention, there is provided a person tracking method including: detecting a person in a first window of a current input image using a skin color of the person; and setting a plurality of second windows in a next input image, correlating the first window and the second windows, and tracking the detected person in the next input image using the correlated results.

According to another aspect of the present invention, there is provided a person tracking apparatus including: an image input unit which outputs continuous images; a person detecting unit which detects a person from a current input image in a first window using a skin color of the person; and a tracking unit which sets a plurality of second windows in a next input image following the current input image, correlates the first window and the second windows, and tracks the detected person in the next input image using the correlated results.

According to another aspect of the present invention, there is provided a computer-readable storage medium encoded with processing instructions for causing a processor to perform a person tracking method including: detecting a person in a first window of a current input image using a skin color of the person; and setting a plurality of second windows in a next input image, correlating the first window and the second windows, and tracking the detected person in the next input image using the correlated results.

According to another aspect of the present invention, there is provided a robot including: an image input unit receiving an image and outputting a captured image; a person detecting unit detecting a person in the captured image using a skin color of the person; a tracking object determining unit selecting a detected person in the captured image as a tracking object; and a tracking unit moving the robot to a location near the tracking object and tracking the tracking object at the location.

Additional and/or other aspects and advantages of the present invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and/or other aspects and advantages of the present invention will become apparent and more readily appreciated from the following detailed description, taken in conjunction with the accompanying drawings, of which:

FIG. 1 is a block diagram of a person tracking apparatus according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a person tracking method according to an embodiment of the present invention;

FIG. 3 is a detailed flowchart illustrating a person detecting operation;

FIG. 4A illustrates an input image and FIG. 4B illustrates the input image with RGB colors normalized;

FIG. 4C illustrates regions that are detected as candidate regions from the input image;

FIG. 5A illustrates an example of the first normalization for a candidate region on the basis of the centroid of the candidate region;

FIGS. 5B, 5C and 5D illustrate examples of the normalized input images;

FIGS. 6A, 6B and 6C respectively show Mahalanobis distance maps for FIGS. 5B, 5C and 5D;

FIGS. 7A through 7D schematically illustrate a process of detecting persons from an input image;

FIG. 8 is a flowchart of a particle filter method;

FIG. 9A illustrates a normalization window image at time (t-1) and FIG. 9B illustrates the normalization window images obtained by centering around each sample; and

FIG. 10, parts (a)-(h), illustrates a process of tracking a moving person.

DETAILED DESCRIPTION OF EMBODIMENTS

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.

FIG. 1 is a block diagram of a person tracking apparatus according to an embodiment of the present invention. The person tracking apparatus includes an image input unit 13, a person detecting unit 10, a tracking object determining unit 11 and a tracking unit 12.

The image input unit 13 outputs an image captured by a photographing arrangement (not shown), and can be any type of camera which can photograph a moving person. The person detecting unit 10 detects the person using a skin color of the person from the image input from the image input unit 13. When multiple persons are detected by the person detecting unit 10, the tracking object determining unit 11 determines (i.e., selects) one of the detected persons as a tracking object, for example, the detected person who is nearest to the centroid of the detected persons in the image. Once the tracking object is determined, the robot approaches to within a certain distance of the tracking object using the location and distance information of the tracking object.

The operation of the person tracking apparatus illustrated in FIG. 1 will now be described in detail with reference to the flowchart in FIG. 2.

First, the person detecting unit 10 detects the person from the input image (operation 20). FIG. 3 is a detailed flowchart illustrating the person detecting operation 20. Referring to FIG. 3, first, color information of the input image is converted (operation 30). The color information conversion reduces the effect of the illumination included in the input image and emphasizes skin color regions. RGB (Red, Green, and Blue) colors of the input image are converted into a normalized rgb domain as shown by equation (1):

$$r = \frac{R}{R+G+B}, \quad g = \frac{G}{R+G+B}, \quad b = \frac{B}{R+G+B} \qquad (1)$$

FIG. 4A illustrates an input image and FIG. 4B illustrates the input image with RGB colors normalized.
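The conversion of equation (1) can be written compactly with NumPy. This is a minimal sketch; the function name and the assumption that the input is an H×W×3 RGB array are illustrative, not from the patent:

```python
import numpy as np

def normalize_rgb(image):
    """Chromaticity normalization of equation (1): each channel is
    divided by R+G+B, which suppresses illumination changes."""
    image = image.astype(np.float64)
    rgb_sum = image.sum(axis=2, keepdims=True)
    rgb_sum[rgb_sum == 0] = 1.0  # avoid division by zero on pure black pixels
    return image / rgb_sum       # r, g, b planes, each in [0, 1]
```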

Next, Gaussian modeling is performed on the rgb image as shown by equation (2), using averages (m_r, m_g) of the colors r and g and standard deviations (σ_r, σ_g) obtained from a plurality of skin color models. Regions where the modeled values are greater than a specified threshold value, for example 240, are detected as candidate regions for the skin color (operation 31):

$$Z(x,y) = G\bigl(r(x,y), g(x,y)\bigr) = \frac{1}{2\pi\sigma_r\sigma_g}\exp\left[-\frac{1}{2}\left\{\left(\frac{r(x,y)-m_r}{\sigma_r}\right)^2 + \left(\frac{g(x,y)-m_g}{\sigma_g}\right)^2\right\}\right] \qquad (2)$$

FIG. 4C illustrates regions that are detected as candidate regions from the input image. The image is binarized so that the candidate regions are expressed in white and other regions are expressed in black.
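The thresholding of operation 31 might look like the following sketch, taking the r and g planes produced by normalize_rgb above. The model parameters are placeholders standing in for trained skin color statistics, and the exponential is rescaled to a 0-255 range so that the threshold of 240 from the text applies; the patent does not specify this scaling:

```python
import numpy as np

def skin_mask(r, g, m_r=0.45, m_g=0.30, s_r=0.05, s_g=0.04, thresh=240):
    """Gaussian skin model of equation (2).  The leading 1/(2*pi*sr*sg)
    factor is folded into a 0-255 rescaling so that a perfect match
    scores 255; pixels above `thresh` form the binary mask of FIG. 4C."""
    z = np.exp(-0.5 * (((r - m_r) / s_r) ** 2 + ((g - m_g) / s_g) ** 2))
    return (255.0 * z) > thresh
```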

In operations 32 and 33, a gray image and an edge image for the input image are obtained, respectively. The edge image can be obtained by the Sobel edge detecting method or the Canny edge detecting method.
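With OpenCV, both images are one call each. A sketch assuming a BGR input frame; the Canny thresholds are illustrative, as the patent gives no values:

```python
import cv2

def gray_and_edges(frame_bgr):
    """Operations 32 and 33: a gray image and an edge image.  Canny is
    used here; a Sobel magnitude map (cv2.Sobel) is the other option
    named in the text."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)  # thresholds are illustrative
    return gray, edges
```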

The regions corresponding to the candidate regions detected in operation 31 are extracted from the gray image and the edge image, and are normalized using the centroid and size information of the candidate regions (operation 34).

FIG. 5A illustrates an example of the first normalization for a candidate region on the basis of a centroid 50 of the candidate region. Each candidate region is normalized so that its height is greater than its width, starting from a square of size a×a centered around the centroid 50. For example, the candidate region is extended by (2×a) to each of the left and right sides of the centroid 50, giving a total width of 2×(2×a), and by (2×a) upward and (3.5×a) downward from the centroid 50, giving a total height of (2×a)+(3.5×a). Here, a may be the square root of the size information, that is, √size.

A second normalization is then performed for each skin color region that has undergone the first normalization. The second normalization is performed through bilinear interpolation. FIGS. 5B, 5C and 5D illustrate examples of the secondly normalized images.
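The two normalizations combine into a crop-and-resize step. A sketch, assuming the 30×40 output size used later in the text; the boundary clamping is an added practical detail:

```python
import numpy as np
import cv2

def normalize_region(image, cx, cy, size, out_w=30, out_h=40):
    """First normalization of FIG. 5A (a window of width 2*(2a) and
    height 2a + 3.5a around the centroid, with a = sqrt(size)), followed
    by the second normalization: a bilinear resize to a fixed patch."""
    a = int(round(np.sqrt(size)))
    x0, x1 = max(cx - 2 * a, 0), cx + 2 * a            # 2a to either side
    y0, y1 = max(cy - 2 * a, 0), int(cy + 3.5 * a)     # 2a up, 3.5a down
    crop = image[y0:y1, x0:x1]
    return cv2.resize(crop, (out_w, out_h), interpolation=cv2.INTER_LINEAR)
```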

Next, the normalized gray images are used to determine whether any of the candidate regions includes the person (operation 35). The determination is performed by applying an SVM (Support Vector Machine) to the normalized gray images. This process is described in more detail below. First, a Mahalanobis distance for the normalized gray image is obtained in block units, each block having a size of p×q.

The average of the pixel values of each block can be obtained using equation (3):

$$\bar{x}_i = \frac{1}{pq}\sum_{(s,t)\in X_i} x_{s,t} \qquad (3)$$

Here, p and q respectively denote the number of horizontal and vertical pixels in each block, $\bar{x}_i$ denotes the average of the pixel values in a block $X_i$, and $x_{s,t}$ denotes a pixel value included in the block.

On the other hand, a variance of each block can be expressed as equation (4):

$$\Sigma_i = \frac{1}{pq}\sum_{(s,t)\in X_i}\left(x_{s,t}-\bar{x}_i\right)\left(x_{s,t}-\bar{x}_i\right)^T \qquad (4)$$

Here, T denotes a transpose.

Using the average and the variance of each block, the Mahalanobis distance $d_{(i,j)}$ and the Mahalanobis distance map D can be obtained using equations (5) and (6), respectively:

$$d_{(i,j)} = \left(\bar{x}_i-\bar{x}_j\right)^T\left(\Sigma_i+\Sigma_j\right)^{-1}\left(\bar{x}_i-\bar{x}_j\right) \qquad (5)$$

$$D = \begin{bmatrix} 0 & d_{(1,2)} & \cdots & d_{(1,MN)} \\ d_{(2,1)} & 0 & \cdots & d_{(2,MN)} \\ \vdots & \vdots & \ddots & \vdots \\ d_{(MN,1)} & d_{(MN,2)} & \cdots & 0 \end{bmatrix} \qquad (6)$$

Here, M and N respectively denote the number of horizontal and vertical blocks in the normalized gray image. If a region having a size of 30×40 in the normalized gray image is divided into blocks having a size of 5×5, the Mahalanobis distance map D becomes a 48×48 matrix.
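A sketch of equations (3) through (6) for a gray patch. Since the pixel values are scalars here, the block covariance of equation (4) reduces to an ordinary variance; the small epsilon guarding against flat blocks is an added detail:

```python
import numpy as np

def mahalanobis_map(patch, p=5, q=5):
    """Block-wise Mahalanobis distance map of equations (3)-(6).  For a
    30 x 40 patch and 5 x 5 blocks, M*N = 6*8 = 48 and D is 48 x 48."""
    h, w = patch.shape
    blocks = (patch.astype(np.float64)
              .reshape(h // q, q, w // p, p)
              .swapaxes(1, 2)
              .reshape(-1, p * q))          # one row of p*q pixels per block
    means = blocks.mean(axis=1)             # equation (3)
    varis = blocks.var(axis=1)              # equation (4), scalar case
    diff = means[:, None] - means[None, :]
    return diff ** 2 / (varis[:, None] + varis[None, :] + 1e-12)  # (5), (6)
```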

FIGS. 6A, 6B and 6C respectively show Mahalanobis distance maps for FIGS. 5B, 5C and 5D. As shown in FIGS. 6A, 6B and 6C, the gray image corresponding to the face of a person shows symmetry centered around a diagonal line from the top left corner to the bottom right corner. However, a gray image without the face of a person is not symmetrical.

An SVM can be trained in advance to recognize the facial image of a person. Here, the SVM is trained to determine whether an image is the facial image of a person using the Mahalanobis distance map, obtained according to equation (6), of each image normalized on the basis of a skin color region.

Accordingly, by applying the Mahalanobis distance map obtained from the normalized gray image for the input image to the SVM, it is determined whether the image contains the person's face.
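A hedged sketch of this step using scikit-learn. The variables train_maps and labels (1 for face, 0 for non-face) are assumed to exist, and the RBF kernel is a common default; the patent specifies neither the kernel nor the training data:

```python
import numpy as np
from sklearn.svm import SVC

# Train an SVM on flattened Mahalanobis distance maps (operation 35),
# using mahalanobis_map from the sketch above.
clf = SVC(kernel="rbf")
clf.fit(np.array([m.ravel() for m in train_maps]), labels)

# Classify one normalized 30 x 40 candidate patch.
is_face = clf.predict(mahalanobis_map(candidate_patch).ravel()[None, :])[0] == 1
```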

Similarity between a normalized edge image and a model image of the person is determined by obtaining a Hausdorff distance (operation 36). Here, the model image of the person means the edge image for at least one model image. One or more of the edge images of the model images may be stored, where the model images contain persons facing the front, a specified angle to the left, and a specified angle to the right.

The Hausdorff distance is obtained by calculating the Euclidean distances between every feature point of the model image and each feature point of the edge image, and between every feature point of the edge image and each feature point of the model image, as follows. If the edge image A is composed of m feature points (pixels) and the model image B is composed of n feature points (pixels), the Hausdorff distance H(A, B) can be expressed by equation (7):

$$H(A,B) = \max\bigl(h(A,B),\, h(B,A)\bigr), \quad h(A,B) = \max_{a\in A}\min_{b\in B}\lVert a-b\rVert, \quad A = \{a_1,\ldots,a_m\}, \quad B = \{b_1,\ldots,b_n\} \qquad (7)$$

In detail, h(A, B) is obtained by selecting the maximum among the minimum values computed for the m feature points (pixels) of the input edge image A, where each minimum value is the smallest of the Euclidean distances between one feature point (pixel) of the input edge image A and every feature point (pixel) of the model image B. Conversely, h(B, A) is obtained by selecting the maximum among the minimum values computed for the n feature points (pixels) of the model image B, where each minimum value is the smallest of the Euclidean distances between one feature point (pixel) of the model image B and every feature point of the input edge image A. H(A, B) is determined as the maximum of h(A, B) and h(B, A); its value indicates how much the input edge image mismatches the model image. The Hausdorff distances between the input edge image and every model image, for example, the front model image, the left model image and the right model image, are calculated, and the maximum value among these Hausdorff distances is output as a final Hausdorff distance. The final Hausdorff distance H(A, B) is compared with a specified threshold value. If the final Hausdorff distance is less than the threshold value, the corresponding candidate region is determined to contain the person; otherwise, the corresponding candidate region is determined to be the background.
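A direct sketch of equation (7), with each image given as an array of edge-pixel coordinates. The O(m·n) pairwise form below is the plain definition; scipy.spatial.distance.directed_hausdorff offers an equivalent, faster routine:

```python
import numpy as np

def h(A, B):
    """Directed distance h(A, B) of equation (7): for each point of A,
    the distance to its nearest point of B; keep the largest minimum.
    A and B are (m, 2) and (n, 2) arrays of edge-pixel coordinates."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # all pairs
    return d.min(axis=1).max()

def hausdorff(A, B):
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(h(A, B), h(B, A))
```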

Using the SVM result of operation 35 and the Hausdorff distance calculation result of operation 36, the persons in the input image are finally detected (operation 37).

FIGS. 7A through 7D schematically illustrate a process of detecting the persons in an input image. The color information of the input image of FIG. 7A is converted, and the person candidate regions illustrated in FIG. 7B are detected from the converted color information. Also, a gray image and an edge image, shown in FIG. 7C, are obtained from the input image of FIG. 7A. In the gray image and the edge image, normalized images having a size of 30×40 pixels are obtained for each of the detected person candidate regions. It is determined whether each of the normalized gray images contains a person using a Mahalanobis distance map and the SVM. The Hausdorff distances between the normalized edge images and the model images for the front, left and right sides of the face are obtained and compared with a threshold value to determine whether each of the normalized edge images contains a person's face. FIG. 7D illustrates the detected multiple persons.

Returning to FIG. 2, when multiple persons are detected in the input image (operation 21), the tracking object determining unit 11 of FIG. 1 determines a tracking object as follows (operation 22). First, a center value for the horizontal axis of the input image is determined. For example, if the input image has a size of 320×240, the center pixel of the horizontal axis is 160. Next, a location and an amplitude of the centroid of each detected person are obtained. The location and the amplitude of the centroid can be obtained by averaging the locations of the skin color pixels of FIG. 7B corresponding to the detected person over the number of the skin color pixels. The person whose centroid location is closest to the center of the horizontal axis of the image is determined to be the observation object. For example, among the detected persons of FIG. 7D, the person denoted by reference numeral 70 may be the observation object.
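A sketch of operation 22 under the assumption that the detector returns one binary skin mask per detected person; the helper name is illustrative:

```python
import numpy as np

def pick_tracking_object(person_masks, image_width=320):
    """Select the detected person whose skin-pixel centroid lies closest
    to the horizontal center of the image (160 for a 320 x 240 image)."""
    center_x = image_width / 2.0
    def centroid_x(mask):
        ys, xs = np.nonzero(mask)          # skin-color pixel coordinates
        return xs.mean()                   # horizontal centroid location
    return min(range(len(person_masks)),
               key=lambda k: abs(centroid_x(person_masks[k]) - center_x))
```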

When the observation object is determined, the robot is moved to a certain location in the vicinity of the observation object (operation 23) and begins to track the observation object at that location (operation 24). The tracking is performed using a particle filter method, as illustrated in FIG. 8.

In the tracking of the present embodiment, the location of a person given a specified measurement value can be expressed by a probability. If a sample set at time t is expressed by $\{s_t^{(n)}, n=1,\ldots,N\}$ and the weight, or state density, of a sample is expressed by $\pi_t^{(n)}$, the weighted sample set for the posterior $p(x_{t-1}\mid Z_{t-1})$ at time (t-1) can be expressed by $\{(s_{t-1}^{(n)}, \pi_{t-1}^{(n)}), n=1,\ldots,N\}$ in operation 80. Here, $Z_{t-1}$ is a feature value measured at time (t-1). In operation 81, N samplings are performed from $\{s_{t-1}^{(n)}\}$ to generate $\{s_t'^{(n)}\}$, and the generated samples undergo drift. The sampling is performed with reference to $\pi_{t-1}^{(n)}$; that is, a sample having a high weight may be sampled several times, while a sample having a relatively low weight may not be sampled at all.

The drift is determined according to specified dynamics reflecting a conditional density function $p(x_t \mid x_{t-1} = s_t'^{(n)})$, so that the new state of a sample is directly influenced by its previous state. The drifted samples are diffused in operation 82, so that the sample set $\{s_t^{(n)}\}$ at time t is generated. Each sample value is determined by a weighted sum of a vector of standard normal random variates and the drifted sample.

The weight of each sample is determined by the observation density $p(z_t \mid x_t)$ at the sample location, as shown by equation (8) (operation 83):

$$\pi_t^{(n)} = p\left(z_t \,\middle|\, x_t = s_t^{(n)}\right), \quad \sum_n \pi_t^{(n)} = 1 \qquad (8)$$

According to the above-mentioned process, in operation 84, the weighted sample set $\{(s_t^{(n)}, \pi_t^{(n)})\}$ at time t is obtained.
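One pass through operations 80 through 84 can be sketched as follows. The observe callback stands for the observation density of equation (8), here the window similarity of equation (9); collapsing drift and diffusion into a single Gaussian perturbation is a simplifying assumption, since the patent does not spell out the dynamics:

```python
import numpy as np

def particle_filter_step(samples, weights, observe, noise_std=5.0):
    """samples: (N, d) array of states; weights: their state densities.
    Resample by weight (op. 81), perturb (ops. 81-82), re-weight by the
    observation density (op. 83), and return the weighted set (op. 84)."""
    n = len(samples)
    idx = np.random.choice(n, size=n, p=weights)       # weighted resampling
    moved = samples[idx] + noise_std * np.random.randn(*samples.shape)
    new_w = np.array([observe(s) for s in moved])      # equation (8)
    new_w = new_w / new_w.sum()                        # normalize to sum 1
    return moved, new_w
```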

In the present embodiment, a similarity based on a color histogram is used as the observation feature. The similarity will now be described in detail.

The similarity is determined by a correlation between an image m₁ of a normalization window at time (t-1) and an image m₂ of a normalization window determined by centering around each sample at time t. FIG. 9A illustrates the normalization window image at time (t-1) and FIG. 9B illustrates the normalization window images obtained by centering around each sample. The similarity between the images of FIG. 9A and FIG. 9B is calculated, and the position of the sample whose window image has the maximum similarity is determined to be the tracking location in the current frame. The similarity is determined by equation (9), where the size of the normalized image is (2n+1)×(2m+1):

$$similarity(m_1, m_2) = \frac{\sum_{i=-n}^{n}\sum_{j=-m}^{m}\left[I_1(u_1+i, v_1+j) - \overline{I_1(u_1,v_1)}\right]\left[I_2(u_2+i, v_2+j) - \overline{I_2(u_2,v_2)}\right]}{(2n+1)(2m+1)\sqrt{\sigma^2(I_1)\times\sigma^2(I_2)}} \qquad (9)$$

Here, I₁ and I₂ are color histograms of m₁ and m₂, respectively, and (u₁,v₁) and (u₂,v₂) are the central pixel locations of m₁ and m₂, respectively.

In equation (9), the average color histogram $\overline{I_k(u,v)}$ of m₁ and m₂ (k = 1, 2) and the variance thereof $\sigma(I_k)$ are calculated using equation (10):

$$\overline{I_k(u,v)} = \frac{\sum_{i=-n}^{n}\sum_{j=-m}^{m} I_k(u+i, v+j)}{(2n+1)(2m+1)}, \quad \sigma(I_k) = \sqrt{\frac{\sum_{i=-n}^{n}\sum_{j=-m}^{m} I_k^2(u+i, v+j)}{(2n+1)(2m+1)} - \overline{I_k(u,v)}^2} \qquad (10)$$
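Equations (9) and (10) together form a normalized cross-correlation. A sketch over two equally sized windows; I1 and I2 are assumed to be arrays holding the per-pixel color histogram values of the two normalization windows:

```python
import numpy as np

def window_similarity(I1, I2):
    """Normalized correlation of equation (9).  The mean and variance of
    equation (10) are the ordinary per-window statistics, so NumPy's
    mean() and var() compute them directly."""
    a = I1 - I1.mean()
    b = I2 - I2.mean()
    denom = I1.size * np.sqrt(I1.var() * I2.var()) + 1e-12  # guard: flat windows
    return (a * b).sum() / denom
```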

Next, a CDF (cumulative distribution function) of the current samples is obtained, and operations 80 through 84 are repeated for the next frame.

Parts (a)-(h) of FIG. 10 illustrate a process of tracking a person who walks to and sits on a chair. As shown in FIG. 10, the person tracking is performed well.

The above-described embodiments of the present invention can also be embodied as computer-readable code on a computer-readable storage medium. A computer-readable storage medium is any data storage device that can store data which can be thereafter read by a computer system. Examples of a computer-readable storage medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the internet). The computer-readable storage medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.

According to the above-described embodiments of the present invention, a person can be detected and tracked, regardless of the facing direction of the person, by detecting persons from an image using the skin color and shape information of the persons, determining a tracking object, and tracking the tracking object using the color histogram information.

Furthermore, using the correlation between the normalized image of the previous frame and the normalized images centered around each sample, a motion of the person can easily be detected.

The above-described embodiments of the present invention can continuously track and monitor a determined person. Accordingly, since pictures can be taken continuously while tracking the specified person, the present invention can be applied to an unmanned camera of a broadcasting station.

Moreover, household electric appliances can be made intelligent using the location and distance information of the person.

Although a few embodiments of the present invention have been shown and described, the present invention is not limited to the described embodiments. Instead, it would be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

CLAIMS

1. A person tracking method comprising: detecting a person in a first window of a current input image using a skin color of the person; and setting a plurality of second windows in a next input image, correlating the first window and the second windows and tracking the detected person in the next input image using the correlated results.

2. The person tracking method as claimed in claim 1, wherein the second windows are set to a size equal to that of the first window in the next input image, centered around the locations of first samples selected in the next input image.

3. The person tracking method as claimed in claim 2, wherein selecting the first samples comprises: sampling the first samples centered around each of second samples included in the first window, with weights of the second samples reflected in the sampling; and reflecting the locations of the second samples to drift the first samples and reflecting the drifted locations of the first samples to determine the locations of the first samples.

4. The person tracking method as claimed in claim 1, wherein the correlation is calculated by an equation

$$similarity(m_1, m_2) = \frac{\sum_{i=-n}^{n}\sum_{j=-m}^{m}\left[I_1(u_1+i, v_1+j) - \overline{I_1(u_1,v_1)}\right]\left[I_2(u_2+i, v_2+j) - \overline{I_2(u_2,v_2)}\right]}{(2n+1)(2m+1)\sqrt{\sigma^2(I_1)\times\sigma^2(I_2)}},$$

wherein I₁ and I₂ are color histograms of the first window and the second window, respectively, (u₁,v₁) and (u₂,v₂) are central pixel locations of the first window and the second window, respectively, and (2n+1)×(2m+1) is a size of the first window or the second window.

5. The person tracking method as claimed in claim 1, further comprising: obtaining a horizontal center of the current input image, when multiple persons are detected in detecting the person; and determining as a tracking object a person closest to the horizontal center among the detected persons.

6. The person tracking method as claimed in claim 1, wherein the detecting the person comprises: detecting at least one skin color region using skin color information for the current input image; determining whether each detected skin color region corresponds to a person candidate region; and determining whether each detected skin color region that is determined to be the person candidate region matches the person using shape information of the person.

7. The person tracking method as claimed in claim 6, wherein detecting the at least one skin color region comprises: normalizing colors of each pixel of the current input image; performing a Gaussian modeling for the normalized image to emphasize pixels of a color similar to the skin color; and binarizing pixels whose modeled values are greater than a specified threshold value among the emphasized pixels to detect the at least one skin color region.

8. The person tracking method as claimed in claim 6, wherein determining whether each detected skin color region corresponds to the person candidate region comprises: normalizing a size of each detected skin color region to a size of the first window; and determining whether each normalized skin color region is the person candidate region.

9. The person tracking method as claimed in claim 8, wherein determining whether each normalized skin color region is the person candidate region comprises: obtaining a gray image for the first window; dividing the gray image into a plurality of blocks and obtaining Mahalanobis distances between the blocks to generate a Mahalanobis distance map; and determining whether each normalized skin color region is the person candidate region using the generated Mahalanobis distance map.

10. The person tracking method as claimed in claim 8, wherein determining whether each normalized skin color region is the person candidate region comprises: obtaining an edge image for the first window; obtaining a similarity between the obtained edge image and edge images of model images; and determining whether each normalized skin color region is a person candidate region according to the similarity.

11. The person tracking method as claimed in claim 10, wherein the similarity is measured by a Hausdorff distance.

12. The person tracking method as claimed in claim 10, wherein the model images include at least one of a front model image, a left model image, and a right model image.

13. A computer-readable storage medium encoded with processing instructions for causing a processor to perform a person tracking method, the method comprising: detecting a person in a first window of a current input image using a skin color of the person; and setting a plurality of second windows in a next input image, correlating the first window and the second windows and tracking the detected person in the next input image using the correlated results.

14. A person tracking apparatus comprising: an image input unit which outputs continuous images; a person detecting unit which detects a person from a current input image in a first window using a skin color of the person; and a tracking unit which sets a plurality of second windows in a next input image following the current input image, correlates the first window and the second windows and tracks the detected person in the next input image using the correlated results.

15. The person tracking apparatus as claimed in claim 14, wherein the second windows are set to a size equal to that of the first window in the next input image, centered on the locations of first samples selected in the next input image.

16. The person tracking apparatus as claimed in claim 15, wherein the first samples are selected from second samples included in the first window using a particle filter.

17. The person tracking apparatus as claimed in claim 14, wherein the correlation is calculated by an equation

$$similarity(m_1, m_2) = \frac{\sum_{i=-n}^{n}\sum_{j=-m}^{m}\left[I_1(u_1+i, v_1+j) - \overline{I_1(u_1,v_1)}\right]\left[I_2(u_2+i, v_2+j) - \overline{I_2(u_2,v_2)}\right]}{(2n+1)(2m+1)\sqrt{\sigma^2(I_1)\times\sigma^2(I_2)}},$$

wherein I₁ and I₂ are color histograms of the first window and the second window, respectively, (u₁,v₁) and (u₂,v₂) are central pixel locations of the first window and the second window, respectively, and (2n+1)×(2m+1) is a size of the first window or the second window.

18. The person tracking apparatus as claimed in claim 14, further comprising a tracking object determining unit which obtains a horizontal center of the current input image and determines as a tracking object a person closest to the center among the detected persons, when multiple persons are detected by the person detecting unit.

19. A robot, comprising: an image input unit receiving an image and outputting a captured image; a person detecting unit detecting a person in the captured image using a skin color of the person; a tracking object determining unit selecting a detected person in the captured image as a tracking object; and a tracking unit moving the robot to a location near the tracking object and tracking the tracking object at the location.

20. The robot as claimed in claim 19, wherein the tracking unit tracks the tracking object using a particle filter method.